Traditional multilingual neural machine translation (MNMT) uses a single model to translate all directions. However, as the number of language pairs grows, using a single model for massive MNMT brings new challenges: parameter tension and heavy computation. In this paper, we revisit multi-way structures by assigning an individual branch to each language (group). Despite the simplicity of this architecture, decentralized models are challenging to train because there are no constraints to align representations from all languages. We propose a localized training recipe to map different branches into a unified space, resulting in an efficient detachable model, Lego-MT. For a fair comparison, we collect data from OPUS and build the first large-scale open-source translation benchmark covering 7 language-centric datasets, each containing 445 language pairs. Experiments show that Lego-MT (1.2B) brings gains of more than 4 BLEU while outperforming M2M-100 (12B). We will release all training data, models, and checkpoints.
translated by Google Translate
Logical reasoning of text is an important ability that requires understanding the information present in the text, their interconnections, and then reasoning through them to infer new conclusions. Prior works on improving the logical reasoning ability of language models require complex processing of training data (e.g., aligning symbolic knowledge to text), yielding task-specific data augmentation solutions that restrict the learning of general logical reasoning skills. In this work, we propose APOLLO, an adaptively pretrained language model that has improved logical reasoning abilities. We select a subset of Wikipedia, based on a set of logical inference keywords, for continued pretraining of a language model. We use two self-supervised loss functions: a modified masked language modeling loss where only specific parts-of-speech words, that would likely require more reasoning than basic language understanding, are masked, and a sentence-level classification loss that teaches the model to distinguish between entailment and contradiction types of sentences. The proposed training paradigm is both simple and independent of task formats. We demonstrate the effectiveness of APOLLO by comparing it with prior baselines on two logical reasoning datasets. APOLLO performs comparably on ReClor and outperforms baselines on LogiQA.
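The selective masking idea in the abstract above can be illustrated with a minimal sketch. It assumes tokens arrive already tagged with parts of speech; the tag set, the `[MASK]` string, the masking probability, and the function name `selective_mask` are illustrative assumptions, not details taken from APOLLO.

```python
import random

MASK = "[MASK]"  # assumed mask symbol, for illustration only


def selective_mask(tagged_tokens, maskable_tags, mask_prob, rng):
    """Mask only tokens whose POS tag is in maskable_tags.

    Returns (inputs, labels). labels holds the original token at masked
    positions and None elsewhere, so a loss would be computed only on
    the masked positions.
    """
    inputs, labels = [], []
    for token, tag in tagged_tokens:
        if tag in maskable_tags and rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(token)
        else:
            inputs.append(token)
            labels.append(None)
    return inputs, labels


# Toy sentence with hand-written tags (hypothetical example data).
sentence = [
    ("Birds", "NOUN"), ("fly", "VERB"), ("because", "SCONJ"),
    ("they", "PRON"), ("have", "VERB"), ("wings", "NOUN"),
]
# mask_prob=1.0 masks every eligible token, which makes the effect visible.
inputs, labels = selective_mask(sentence, {"VERB", "SCONJ"}, 1.0, random.Random(0))
print(inputs)
```

Nouns and pronouns are never masked here, whatever the probability, which is the contrast with uniform random masking.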
kNN-MT presents a new paradigm for domain adaptation by building an external datastore, which usually saves all target language token occurrences in the parallel corpus. As a result, the constructed datastore is usually large and possibly redundant. In this paper, we investigate the interpretability issue of this approach: what knowledge does the NMT model need? We propose the notion of local correctness (LAC) as a new angle, which describes the potential translation correctness for a single entry and for a given neighborhood. Empirical study shows that our investigation successfully finds the conditions where the NMT model could easily fail and need related knowledge. Experiments on six diverse target domains and two language-pairs show that pruning according to local correctness brings a light and more explainable memory for kNN-MT domain adaptation.
Entities, as important carriers of real-world knowledge, play a key role in many NLP tasks. We focus on incorporating entity knowledge into an encoder-decoder framework for informative text generation. Existing approaches tried to index, retrieve, and read external documents as evidence, but they suffered from a large computational overhead. In this work, we propose an encoder-decoder framework with an entity memory, namely EDMem. The entity knowledge is stored in the memory as latent representations, and the memory is pre-trained on Wikipedia along with encoder-decoder parameters. To precisely generate entity names, we design three decoding methods to constrain entity generation by linking entities in the memory. EDMem is a unified framework that can be used on various entity-intensive question answering and generation tasks. Extensive experimental results show that EDMem outperforms both memory-based auto-encoder models and non-memory encoder-decoder models.
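Fight detection in videos is an emerging deep learning application, given today's prevalence of surveillance systems and streaming media. Previous work has largely relied on action recognition techniques to tackle this problem. In this paper, we propose a simple but effective method that approaches the task from a new perspective: we design the fight detection model as a composition of an action-aware feature extractor and an anomaly score generator. In addition, because collecting frame-level labels for videos is too laborious, we design a weakly supervised two-stage training scheme, in which we train the score generator with a multiple-instance-learning loss computed on video-level labels and adopt self-training to further improve its performance. Extensive experiments on a publicly available large-scale dataset, UBI-Fights, demonstrate the effectiveness of our method, whose performance on the dataset exceeds several previous state-of-the-art approaches. Furthermore, we collect a new dataset, VFD-2000, that specializes in video fight detection and is larger in scale and covers more scenarios than existing datasets. The implementation of our method and the proposed dataset will be publicly available at https://github.com/hepta-col/videofightdetection.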
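As a rough illustration of training from video-level labels only, the sketch below shows one common form of multiple-instance-learning loss: a hinge ranking loss between the top-scoring segment of a fight video and the top-scoring segment of a normal video. This particular loss form, the margin value, and the function name `mil_ranking_loss` are assumptions; the abstract does not specify which formulation the authors use.

```python
def mil_ranking_loss(fight_scores, normal_scores, margin=1.0):
    """Hinge ranking loss over two bags of per-segment anomaly scores.

    fight_scores:  scores for segments of a video labeled "fight"
    normal_scores: scores for segments of a video labeled "normal"
    Only the video-level label is needed: the highest-scoring segment in
    the fight video should outrank the highest-scoring segment in the
    normal video by at least `margin`.
    """
    top_fight = max(fight_scores)
    top_normal = max(normal_scores)
    return max(0.0, margin - top_fight + top_normal)


# Hypothetical scores in [0, 1] for two short videos.
loss = mil_ranking_loss([0.1, 0.9, 0.2], [0.1, 0.3])
print(loss)
```

The loss reaches zero once the two top scores are separated by the margin, so no frame-level annotation is required.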
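Knowledge-intensive tasks, such as open-domain question answering (QA), require access to a large amount of world knowledge or domain knowledge. A common approach to knowledge-intensive tasks is a retrieve-then-read pipeline, which first retrieves a handful of relevant contextual documents from an external corpus such as Wikipedia and then predicts an answer conditioned on the retrieved documents. In this paper, we present a novel perspective for solving knowledge-intensive tasks by replacing document retrievers with large language model generators. We call our method generate-then-read (GenRead): it first prompts a large language model to generate contextual documents based on a given question, and then reads the generated documents to produce the final answer. Furthermore, we propose a clustering-based prompting method that selects distinct prompts, so that the generated documents cover different perspectives, leading to better recall of acceptable answers. We conduct extensive experiments on three different knowledge-intensive tasks: open-domain QA, fact checking, and dialogue systems. Notably, GenRead achieves exact match scores of 71.6 and 54.4 on TriviaQA and WebQ, significantly outperforming the state-of-the-art retrieve-then-read pipeline DPR-FiD by +4.0 and +3.9, without retrieving any documents from any external knowledge source. Finally, we demonstrate that model performance can be further improved by combining retrieval and generation.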
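Recently, MLP-like vision models have achieved promising performance on mainstream visual recognition tasks. In contrast with vision transformers and CNNs, the success of MLP-like models shows that simple information fusion operations among tokens and channels can yield good representation power for deep recognition models. However, existing MLP-like models fuse tokens through static fusion operations, lacking adaptability to the contents of the tokens to be mixed. Thus, customary information fusion procedures are not effective enough. To this end, this paper presents an efficient MLP-like network architecture, dubbed DynaMixer, which resorts to dynamic information fusion. Critically, we propose a procedure, on which the DynaMixer model relies, to dynamically generate mixing matrices by leveraging the contents of all the tokens to be mixed. To reduce time complexity and improve robustness, a dimensionality reduction technique and a multi-segment fusion mechanism are adopted. Our proposed DynaMixer model (97M parameters) achieves 84.3% top-1 accuracy on the ImageNet-1K dataset without extra training data, performing favorably against state-of-the-art vision MLP models. When the number of parameters is reduced to 26M, it still achieves 82.7% top-1 accuracy, surpassing existing MLP-like models of similar capacity. The code is available at https://github.com/ziyuwwang/dynamixer.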
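The idea of a content-dependent mixing matrix can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the authors' implementation: the row-wise softmax, the weight shapes, and the names `dynamic_mix`, `w_reduce`, and `w_mix` are hypothetical, and the multi-segment fusion mechanism is omitted.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of logits."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [v / total for v in exps]


def dynamic_mix(tokens, w_reduce, w_mix):
    """Mix n tokens with a matrix generated from the tokens themselves.

    tokens:   n x d token features
    w_reduce: d x r projection (dimensionality reduction, r < d)
    w_mix:    (n*r) x (n*n) projection producing mixing logits
    Returns (mixed tokens, n x n mixing matrix).
    """
    n = len(tokens)
    reduced = matmul(tokens, w_reduce)             # n x r
    flat = [v for row in reduced for v in row]     # contents of all tokens
    logits = matmul([flat], w_mix)[0]              # n*n logits
    mixing = [softmax(logits[i * n:(i + 1) * n]) for i in range(n)]
    return matmul(mixing, tokens), mixing


# Two tokens with two channels; zero weights give a uniform mixing matrix.
tokens = [[1.0, 2.0], [3.0, 4.0]]
w_reduce = [[0.0], [0.0]]   # d=2 -> r=1
w_mix = [[0.0] * 4] * 2     # (n*r)=2 -> (n*n)=4
mixed, mixing = dynamic_mix(tokens, w_reduce, w_mix)
print(mixing, mixed)
```

Unlike a static token-mixing layer, the matrix `mixing` is recomputed for every input, so different token contents produce different fusion weights.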
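A novel simulator called VMAgent is introduced to help RL researchers better explore new methods, especially for virtual machine scheduling. VMAgent is inspired by practical virtual machine (VM) scheduling tasks and provides an efficient simulation platform that reflects the real situations of cloud computing. Three scenarios (fading, recovering, and expansion) are distilled from practical cloud computing and correspond to many reinforcement learning challenges (high-dimensional state and action spaces, high non-stationarity, and life-long demand). VMAgent provides flexible configurations that let RL researchers design customized scheduling environments with different problem features in mind. From the VM scheduling perspective, VMAgent also helps to explore better learning-based scheduling solutions.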
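Medical image segmentation is one of the fundamental problems for artificial-intelligence-based clinical decision systems. Current automatic medical image segmentation methods often fail to meet clinical requirements. As a result, a series of interactive segmentation algorithms have been proposed to make use of expert correction information. However, existing methods suffer from segmentation refinement failures after long-term interaction and from the cost of expert annotation, both of which hinder clinical application. This paper proposes an interactive segmentation framework, called interactive medical segmentation, built on corrective action evaluation, which combines action-based confidence learning and multi-agent reinforcement learning (MARL). The evaluation is established through a novel action-based confidence network, and the corrective actions are obtained from MARL. Based on the confidence information, more detailed feedback is provided, and a simulated label generation mechanism is proposed for unsupervised data to reduce over-reliance on labeled data. Experimental results on various medical image datasets show the significant performance of the proposed algorithm.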
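Previous research on adapting a general neural machine translation (NMT) model to a specific domain usually neglects the diversity of translation within the same domain, which is a core problem for domain adaptation in real-world scenarios. One representative of such challenging scenarios is deploying a translation system for a conference on a specific topic, such as global warming or coronavirus, where resources are usually extremely scarce because of the tight schedule. To motivate wider investigation of such scenarios, we present a real-world fine-grained domain adaptation task in machine translation (FGraDA). The FGraDA dataset consists of Chinese-English translation tasks for four sub-domains of information technology: autonomous vehicles, AI education, real-time networks, and smartphones. Each sub-domain is equipped with a development set and a test set for evaluation purposes. To be closer to reality, FGraDA does not use any in-domain bilingual training data; instead it provides bilingual dictionaries and a wiki knowledge base, which are easier to obtain within a short time. We benchmark the fine-grained domain adaptation task and present in-depth analyses showing that challenging problems remain in further improving performance with heterogeneous resources.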